Applying Finite-State Methods to the Swahili Language
Abstract
We examine the finite-state methods currently used to analyze English grammar and ask whether they can be applied to Swahili and its syntactic patterns. We also explore the differences between Swahili and English grammar to determine whether these methods can be adapted to Swahili. The end goal is a recognition device that identifies well-formed Swahili sentences and generates new Swahili strings whose components (i.e., noun phrases, prepositional phrases, etc.) are enclosed in parentheses. To the best of our knowledge, no prior work has been done on Swahili language processing. Moreover, the structure of Swahili sentences differs substantially from that of English sentences; for example, Swahili has no equivalents of the English determiners (DET) “the” and “a”. We hope that our work can contribute to better Swahili-to-English and English-to-Swahili dictionaries and translators, Swahili grammar checkers, speech recognition devices, and the like. We anticipate that our work in this field can bridge the gap between native Swahili speakers and native English speakers and lead to approaches for other dialects as well.
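As a rough illustration of the kind of recognition device the abstract describes, the following Python sketch accepts a sentence only if its part-of-speech tag sequence matches a regular (finite-state) pattern, and then encloses noun phrases and prepositional phrases in parentheses. It is not the authors' implementation: the toy lexicon, the tag set (N, A, V, P), the pattern `NP V (NP) (PP)*`, and the function names `is_well_formed` and `bracket` are all assumptions made for this example. The noun-phrase pattern has no determiner slot, reflecting the abstract's observation that Swahili lacks equivalents of “the” and “a”.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration, not from the paper).
# Each word maps to a one-letter tag so a sentence becomes a string of tags.
LEXICON = {
    "mtoto": "N",    # child
    "kitabu": "N",   # book
    "shule": "N",    # school
    "mzuri": "A",    # good
    "kikubwa": "A",  # big
    "anasoma": "V",  # reads / is reading
    "katika": "P",   # in
}

# Regular patterns over tags. Note: no determiner slot in the noun phrase,
# and adjectives follow the noun.
NP = r"NA*"                  # noun followed by zero or more adjectives
PP = rf"P{NP}"               # preposition followed by a noun phrase
SENTENCE = re.compile(rf"{NP}V(?:{NP})?(?:{PP})*")
CHUNK = re.compile(rf"(?P<PP>{PP})|(?P<NP>{NP})")


def tag(sentence):
    """Return (words, tag_string); tag_string is None if a word is unknown."""
    words = sentence.lower().split()
    try:
        tags = "".join(LEXICON[w] for w in words)
    except KeyError:
        return words, None
    return words, tags


def is_well_formed(sentence):
    """True if the whole tag sequence matches the toy sentence pattern."""
    _, tags = tag(sentence)
    return tags is not None and SENTENCE.fullmatch(tags) is not None


def bracket(sentence):
    """Return the sentence with NP and PP chunks in parentheses, or None."""
    words, tags = tag(sentence)
    if tags is None or SENTENCE.fullmatch(tags) is None:
        return None
    out, pos = [], 0
    for m in CHUNK.finditer(tags):
        out.extend(words[pos:m.start()])            # words outside any chunk
        chunk_words = " ".join(words[m.start():m.end()])
        out.append(f"({m.lastgroup} {chunk_words})")
        pos = m.end()
    out.extend(words[pos:])
    return " ".join(out)


print(bracket("mtoto mzuri anasoma kitabu katika shule"))
# (NP mtoto mzuri) anasoma (NP kitabu) (PP katika shule)
```

Because each tag is a single character, one tag corresponds to one word, so match offsets in the tag string index directly into the word list. A realistic system would also need Swahili morphology (for example, noun-class agreement), which this sketch ignores.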
Similar resources
The Form and Interpretation of Finite and non-Finite Verbs in Swahili
A great deal is known about the distribution of finite and non-finite forms in early language and increasingly, studies are investigating the semantic properties of these different forms. An important question for acquisition theory concerns the relationship between the child's developing morphosyntax and the semantics typically expressed by these structures. In this paper we will explore the f...
SYNERGY: A Named Entity Recognition System for Resource-scarce Languages such as Swahili using Online Machine Translation
Developing Named Entity Recognition (NER) for a new language using standard techniques requires collecting and annotating large training resources, which is costly and time-consuming. Consequently, for many widely spoken languages such as Swahili, there are no freely available NER systems. We present here a new technique to perform NER for new languages using online machine translation systems....
The morphosyntax of mood in early grammar with special reference to Swahili
In this paper we explore the development of the morphosyntax-semantics interface by comparing development in 4 typologically diverse languages: Dutch (a Germanic V2 language), Greek, Italian (a Romance pro-drop language) and Swahili (a Bantu language), with particular emphasis on Swahili, a relatively understudied language whose morphosyntactic structure is particularly relevant to the question...
HFST - Framework for Compiling and Applying Morphologies
HFST–Helsinki Finite-State Technology (hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently connects some of the most important finite-state tools for creating morphologies and spellers into one open-source platform and supports extending and improving the descriptions with weights to accommodate the modeling of statistical inf...
Word-Level Language Identification and Predicting Codeswitching Points in Swahili-English Language Data
Codeswitching is a very common behavior among Swahili speakers, but of the little computational work done on Swahili, none has focused on codeswitching. This paper addresses two tasks relating to Swahili-English codeswitching: word-level language identification and prediction of codeswitch points. Our two-step model achieves high accuracy at labeling the language of words using a simple feature...